Word Alignment without NULL Words
Abstract
In word alignment, certain source words are needed only for fluency and have no translation on the target side. Most word alignment models assume a target NULL word from which they generate these untranslatable source words. Hypothesising a target NULL word is not without problems, however. For example, because this NULL word has a position, it interferes with the distribution over alignment jumps. We present a word alignment model that accounts for untranslatable source words by generating them from preceding source words. It thereby removes the need for a target NULL word and only models alignments between word pairs that are actually observed in the data. Translation experiments on English paired with Czech, German, French and Japanese show that the model outperforms its traditional IBM counterparts in terms of BLEU score.
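The core idea of the abstract, generating a source word either from an observed target word or from the preceding source word, can be illustrated with a toy mixture. This is a minimal sketch under stated assumptions, not the paper's actual model: the uniform alignment prior, the fixed mixing weight `q`, the function names, and the probability tables `t` and `b` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical table types: t[(source_word, target_word)] is a translation
# probability, b[(source_word, previous_source_word)] is a source bigram
# probability. Neither is taken from the paper.
Table = Dict[Tuple[str, str], float]


def source_word_prob(j: int, src: List[str], tgt: List[str],
                     t: Table, b: Table, q: float) -> float:
    """Probability of src[j] under a toy NULL-free mixture.

    With probability (1 - q) the word is translated from a uniformly chosen
    observed target word; with probability q it is generated from the
    preceding source word. No target NULL position is involved.
    """
    # Translation component: average over the observed target words only.
    trans = sum(t.get((src[j], e), 0.0) for e in tgt) / len(tgt)
    if j == 0:
        # Assumption: the first source word has no predecessor,
        # so it must be translated.
        return trans
    # Source-context component: generate from the preceding source word.
    ctx = b.get((src[j], src[j - 1]), 0.0)
    return (1.0 - q) * trans + q * ctx


def sentence_likelihood(src: List[str], tgt: List[str],
                        t: Table, b: Table, q: float) -> float:
    """Product of the per-word probabilities over the source sentence."""
    prob = 1.0
    for j in range(len(src)):
        prob *= source_word_prob(j, src, tgt, t, b, q)
    return prob


# Toy usage: Czech has no articles, so English "the" has no counterpart.
src = ["in", "the", "forest"]
tgt = ["v", "lese"]
t = {("in", "v"): 0.8, ("forest", "lese"): 0.9, ("the", "v"): 0.05}
b = {("the", "in"): 0.5}

with_context = sentence_likelihood(src, tgt, t, b, q=0.3)
translation_only = sentence_likelihood(src, tgt, t, b, q=0.0)
print(with_context > translation_only)
```

In this toy setting the article "the" receives most of its probability mass from the preceding source word "in" rather than from any target word, so the alignment component only ever scores word pairs that actually occur in the sentence pair.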
Similar resources
Fast Collocation-Based Bayesian HMM Word Alignment
We present a new Bayesian HMM word alignment model for statistical machine translation. The model is a mixture of an alignment model and a language model. The alignment component is a Bayesian extension of the standard HMM. The language model component is responsible for the generation of words needed for source fluency reasons from source language context. This allows for untranslatable source...
Improving IBM Word Alignment Model 1
We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM train...
ProAlign: Shared Task System Description
ProAlign combines several different approaches in order to produce high-quality word alignments. Like competitive linking, ProAlign uses a constrained search to find high-scoring alignments. Like EM-based methods, a probability model is used to rank possible alignments. The goal of this paper is to give a bird's-eye view of the ProAlign system to encourage discussion and comparison. 1 Alig...
Improving bilingual alignment models: Cognate identification, length dependence, and phrases
Determining exactly how words in a French sentence correspond to their counterparts in an English translation is an essential component of a machine translation system. For example, given the sentences Je suggère que tu arrives à l’heure and I suggest you arrive on time, we might hope to align je to I, suggère to suggest, and so forth. The IBM Models use pure distributional statistics to determ...
You'll Take the High Road and I'll Take the Low Road: Using a Third Language to Improve Bilingual Word Alignment
While language-independent sentence alignment programs typically achieve a recall in the 90 percent range, the same cannot be said about word alignment systems, where normal recall figures tend to fall somewhere between 20 and 40 percent, in the language-independent case. As words (and phrases) for various reasons are more interesting to align than sentences, we need methods to increase word al...